SamBaTen: Sampling-based Batch Incremental Tensor Decomposition

Authors

  • Ekta Gujral
  • Ravdeep Pasricha
  • Evangelos E. Papalexakis
Abstract

Tensor decompositions are invaluable tools in analyzing multimodal datasets. In many real-world scenarios, such datasets are far from static; on the contrary, they tend to grow over time. For instance, in an online social network setting, as we observe new interactions over time, our dataset gets updated in its “time” mode. How can we maintain a valid and accurate tensor decomposition of such a dynamically evolving multimodal dataset, without having to re-compute the entire decomposition after every single update? In this paper we introduce SAMBATEN, a Sampling-based Batch Incremental Tensor Decomposition algorithm, which incrementally maintains the decomposition given new updates to the tensor dataset. SAMBATEN is able to scale to datasets that the state-of-the-art in incremental tensor decomposition is unable to operate on, due to its ability to effectively summarize the existing tensor and the incoming updates, and to perform all computations in the reduced summary space. We extensively evaluate SAMBATEN using synthetic and real datasets. Indicatively, SAMBATEN achieves comparable accuracy to state-of-the-art incremental and non-incremental techniques, while being 25-30 times faster. Furthermore, SAMBATEN scales to very large sparse and dense dynamically evolving tensors of dimensions up to 100K × 100K × 100K, where state-of-the-art incremental approaches were not able to operate.
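
To make the setting concrete, the sketch below shows the basic batch-incremental CP update that SAMBATEN accelerates: when a batch of new slices arrives in the "time" mode, only that mode's factor gains new rows, which solve a least-squares problem against the two fixed factors. This is a simplified baseline written with TensorLy, not the authors' implementation; SAMBATEN additionally samples fibers so this solve happens on a small summary instead of the full system.

    # A simplified baseline, not the authors' code: new frontal slices in the
    # third ("time") mode only grow that mode's factor; its new rows are fit
    # by least squares against the fixed factors A and B.
    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import parafac

    rank = 5
    X = tl.tensor(np.random.rand(50, 40, 30))        # existing I x J x K tensor
    weights, (A, B, C) = parafac(X, rank=rank)       # CP factors of the history

    X_new = np.random.rand(50, 40, 6)                # incoming I x J x K_new batch

    M = tl.tenalg.khatri_rao([A, B])                 # (I*J) x rank design matrix
    Y = tl.unfold(tl.tensor(X_new), mode=2)          # K_new x (I*J) unfolding
    sol, *_ = np.linalg.lstsq(M, Y.T, rcond=None)    # fit new time-mode rows
    C = np.vstack([C, sol.T])                        # factor now spans K + K_new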

Similar resources

Scalable Bayesian Non-negative Tensor Factorization for Massive Count Data

We present a Bayesian non-negative tensor factorization model for count-valued tensor data, and develop scalable inference algorithms (both batch and online) for dealing with massive tensors. Our generative model can handle overdispersed counts as well as infer the rank of the decomposition. Moreover, leveraging a reparameterization of the Poisson distribution as a multinomial facilitates conju...
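
The reparameterization referred to here is the standard Poisson-multinomial identity: independent counts x_r ~ Poisson(λ_r) can equivalently be drawn by sampling the total from a Poisson and splitting it multinomially with probabilities λ_r / Σλ. A quick NumPy check of the two generative routes (an illustration only, not the paper's inference code):

    # Illustration only: both sampling routes give the same distribution,
    # with per-component mean and variance equal to lam (Poisson).
    import numpy as np

    rng = np.random.default_rng(0)
    lam = np.array([2.0, 5.0, 3.0])                  # per-component rates
    n_draws = 20000

    direct = rng.poisson(lam, size=(n_draws, 3))     # x_r ~ Poisson(lam_r)

    totals = rng.poisson(lam.sum(), size=n_draws)    # total ~ Poisson(sum lam)
    split = np.array([rng.multinomial(n, lam / lam.sum()) for n in totals])

    print(direct.mean(axis=0), split.mean(axis=0))   # both approach lam
    print(direct.var(axis=0), split.var(axis=0))     # both approach lam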


Tensor Train decomposition on TensorFlow (T3F)

Tensor Train decomposition is used across many branches of machine learning, but until now it lacked an implementation with GPU support, batch processing, automatic differentiation, and versatile functionality for a Riemannian optimization framework, which takes into account the underlying manifold structure in order to construct efficient optimization methods. In this work, we propose a library th...
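
For orientation, the core routine such a library implements is TT-SVD: sequential truncated SVDs peel off one mode at a time, leaving a chain of three-way cores of shape (r_{k-1}, n_k, r_k). A minimal NumPy version (a didactic sketch; tt_svd is a hypothetical helper, not the T3F API):

    # Didactic TT-SVD sketch: each pass reshapes the remainder, truncates its
    # SVD to max_rank, and stores the left factor as the next TT core.
    import numpy as np

    def tt_svd(tensor, max_rank):
        shape = tensor.shape
        cores, r_prev = [], 1
        mat = tensor.reshape(shape[0], -1)
        for n_k in shape[:-1]:
            mat = mat.reshape(r_prev * n_k, -1)
            U, S, Vt = np.linalg.svd(mat, full_matrices=False)
            r = min(max_rank, len(S))
            cores.append(U[:, :r].reshape(r_prev, n_k, r))
            mat = S[:r, None] * Vt[:r]               # carry the residue forward
            r_prev = r
        cores.append(mat.reshape(r_prev, shape[-1], 1))
        return cores

    X = np.random.rand(4, 5, 6, 7)
    print([c.shape for c in tt_svd(X, max_rank=3)])  # chained (r_prev, n_k, r)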


Kronecker Determinantal Point Processes

Determinantal Point Processes (DPPs) are probabilistic models over all subsets of a ground set of N items. They have recently gained prominence in several applications that rely on “diverse” subsets. However, their applicability to large problems is still limited due to the O(N³) complexity of core tasks such as sampling and learning. We enable efficient sampling and learning for DPPs by introducing...
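
The gain from Kronecker structure comes from a standard linear-algebra fact: the spectrum of L1 ⊗ L2 is the set of pairwise products of the factors' eigenvalues, so the eigendecomposition behind DPP sampling and learning never has to touch the full N × N kernel (here N = N1 · N2). A quick numerical check:

    # Numerical check: eigenvalues of kron(L1, L2) equal all pairwise
    # products of the eigenvalues of L1 and L2.
    import numpy as np

    rng = np.random.default_rng(1)

    def random_psd(n):
        A = rng.standard_normal((n, n))
        return A @ A.T                               # symmetric PSD factor

    L1, L2 = random_psd(4), random_psd(5)
    full = np.sort(np.linalg.eigvalsh(np.kron(L1, L2)))
    factored = np.sort(np.outer(np.linalg.eigvalsh(L1),
                                np.linalg.eigvalsh(L2)).ravel())
    print(np.allclose(full, factored))               # True: spectra multiply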


Characterization of Deterministic and Probabilistic Sampling Patterns for Finite Completability of Low Tensor-Train Rank Tensor

In this paper, we analyze the fundamental conditions for low-rank tensor completion given the separation or tensor-train (TT) rank, i.e., ranks of unfoldings. We exploit the algebraic structure of the TT decomposition to obtain the deterministic necessary and sufficient conditions on the locations of the samples to ensure finite completability. Specifically, we propose an algebraic geometric an...


Sublinear Time Orthogonal Tensor Decomposition

A recent work (Wang et al., NIPS 2015) gives the fastest known algorithms for orthogonal tensor decomposition with provable guarantees. Their algorithm is based on computing sketches of the input tensor, which requires reading the entire input. We show that in a number of cases one can achieve the same theoretical guarantees in sublinear time, i.e., even without reading most of the input tensor. In...
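
For context, the problem being accelerated is recovering the components of an orthogonally decomposable symmetric tensor T = Σ_i λ_i v_i ⊗ v_i ⊗ v_i. Below is the classical dense tensor power iteration with deflation, shown only to fix ideas; unlike the sublinear algorithms above, it reads the whole tensor:

    # Power iteration with deflation on an orthogonally decomposable
    # symmetric 3-way tensor; recovers the weights 3, 2, 1 (in some order).
    import numpy as np

    rng = np.random.default_rng(2)
    d, k = 8, 3
    V, _ = np.linalg.qr(rng.standard_normal((d, d)))
    V, lam = V[:, :k], np.array([3.0, 2.0, 1.0])     # orthonormal components
    T = sum(l * np.einsum('i,j,k->ijk', v, v, v) for l, v in zip(lam, V.T))

    def power_iterate(T, iters=200):
        u = rng.standard_normal(T.shape[0])
        u /= np.linalg.norm(u)
        for _ in range(iters):
            u = np.einsum('ijk,j,k->i', T, u, u)     # the map u -> T(I, u, u)
            u /= np.linalg.norm(u)
        return u, np.einsum('ijk,i,j,k->', T, u, u, u)

    for _ in range(k):                               # deflate after each find
        u, l = power_iterate(T)
        print(round(l, 3))
        T = T - l * np.einsum('i,j,k->ijk', u, u, u)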



Journal:
  • CoRR

Volume: abs/1709.00668
Issue: -
Pages: -
Publication date: 2017